Some Inequalities Governing Optimum Code


Let an information source be given which generates messages
consisting of sequences of letters $(X_1, X_2, \ldots, X_n)$. Each letter $X_i$
occurs with probability $p_i$. In practice, for example, the $X_i$ may be
the letters of the alphabet and the $p_i$ their frequencies of usage in English.
These letters $X_i$ are to be encoded for transmission over a communication
channel admitting the symbols $\alpha$ and $\beta$.

In the present paper, we consider prefix codes. In 1952, using an
elegant combinatorial approach, Huffman [1] obtained an optimum
prefix code for the case where the symbols $\alpha$ and $\beta$ cost the same. Later
Karp [2] used integer programming to obtain optimum prefix codes with
symbols of different costs. Here, we use a combinatorial argument to
study the case where $\alpha$ costs $d$ dollars and $\beta$ costs
$d + 1$ dollars, where $d$ is a positive integer. This case reduces
to Huffman's case when $d$ becomes infinite and, for $d = 1$, approximates
the dot-dash case of common usage.

A  prefix  code  may  be  described  by  a  tree  as  shown  in  Figure  1. 


Each terminal node is associated with a letter $X_i$. The branches
leaving each node are labeled with the names of the distinct symbols $\alpha$ and
$\beta$, and the code word of each $X_i$ is found by listing in
order the labels of the branches on the path from the root of the tree to the
terminal node associated with $X_i$. Thus, in Figure 1, one letter has the
code word $\alpha\beta$ and another has the code word $\beta\alpha$. The length
$l_i$ of a letter $X_i$ is the sum of the costs of the $\alpha$'s and $\beta$'s
used in its code word. Thus, these two letters have the same length, and
another letter in Figure 1 is of length $\alpha + 3\beta$. The length of $X_i$
is a direct measure of its cost. Once a tree, such as in Figure 1, is given
as a prefix code, the cost of the code is given by


(1)  $\sum_i p_i l_i \qquad (i = 1, 2, \ldots, n)$
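As a small illustration of the cost (1), the following Python sketch evaluates a given prefix code. The names `symbol_cost`, `word_length`, `is_prefix_code` and `code_cost`, the use of the characters "a" and "b" for $\alpha$ and $\beta$, and the four-letter example with integer weights in place of probabilities are all assumptions made here for illustration; they do not come from the paper.

```python
def symbol_cost(symbol, d):
    """Cost of one channel symbol: 'a' (alpha) costs d, 'b' (beta) costs d + 1."""
    if symbol == "a":
        return d
    if symbol == "b":
        return d + 1
    raise ValueError("unknown symbol: %r" % symbol)


def word_length(word, d):
    """Length l_i of a code word: the sum of the costs of its symbols."""
    return sum(symbol_cost(s, d) for s in word)


def is_prefix_code(code_words):
    """True if no code word is a prefix of (or equal to) another one."""
    return not any(
        i != j and v.startswith(u)
        for i, u in enumerate(code_words)
        for j, v in enumerate(code_words)
    )


def code_cost(weights, code_words, d):
    """Cost (1): the sum of p_i * l_i over all letters."""
    return sum(p * word_length(w, d) for p, w in zip(weights, code_words))


# Hypothetical example: weights for X_1..X_4 (unnormalized probabilities).
weights = [1, 1, 2, 3]
code_words = ["bb", "ba", "ab", "aa"]
example_cost = code_cost(weights, code_words, d=1)
```

With $d = 1$ the two code words "ab" and "ba" both have length 3, which mirrors the remark that two different code words can have the same length.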

The problem of constructing an optimum prefix code is, with given $p_i$,
to find a tree such that (1) is minimum. Assume the $X_i$ are indexed so
that

(2)  $p_n \ge p_{n-1} \ge \cdots \ge p_2 \ge p_1$.

Then for an optimum code, it is necessary that

(3)  $l_n \le l_{n-1} \le \cdots \le l_2 \le l_1$.

If (2) and (3) are not both satisfied, we could interchange code words and
reduce the value of (1).
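The interchange argument can be written out as a one-line calculation. The indices $i$ and $j$ are chosen here only for illustration: suppose $p_i > p_j$ but $l_i > l_j$, and exchange the two code words. The change in the cost (1) is

```latex
\begin{aligned}
\Delta &= (p_i l_j + p_j l_i) - (p_i l_i + p_j l_j) \\
       &= -(p_i - p_j)(l_i - l_j) \;<\; 0 ,
\end{aligned}
```

so the exchange strictly lowers the cost, and a code with such a pair of letters cannot be optimum.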

Let us define $\bar{l}_i$ of a letter $X_i$ to be the length of $X_i$ minus
the cost of the last symbol in the code word representing $X_i$. In Figure 1,
for example, if we discount the last symbol $\alpha$ of one of the code words,
its $\bar{l}$ is $\alpha + 2\beta$. Similarly, the $\bar{l}$ of a second letter is
$2\beta$ and that of a third is $\beta$. Since the cost of $\alpha$ is an integer
and the cost of $\beta$ is also an integer, the $\bar{l}_i$ of $X_i$ and the
$\bar{l}_j$ of $X_j$ are also integers. And if

(4)  $\bar{l}_i > \bar{l}_j$,

then they differ by at least 1, i.e., $\bar{l}_i \ge \bar{l}_j + 1$.

Assume the last symbol of $X_j$ is $\beta$ and the last symbol of $X_i$
is $\alpha$. As (4) implies $\bar{l}_j + d + 1 \le \bar{l}_i + d$, we have (it is clear
that (5) is true if the last symbol of $X_j$ is $\alpha$ and that of $X_i$
is $\beta$, or in the case where both $X_i$ and $X_j$ have the same last symbol)

(5)  $l_j \le l_i$.

Therefore, for an optimum code, (4) implies (5), and (5) implies, by the
interchange argument above, that we may take $p_i \le p_j$.
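The step from reduced lengths to full lengths can be checked mechanically for small code words. The sketch below is an illustration only: the function names, the characters "a" and "b" standing for $\alpha$ and $\beta$, and the bound of four symbols per word are assumptions made here. It enumerates all words up to that size and confirms that a strictly larger reduced length never goes with a strictly smaller full length when both symbol costs are the integers $d$ and $d + 1$.

```python
from itertools import product

ALPHA, BETA = "a", "b"  # stand-ins for the two channel symbols


def length(word, d):
    """Full length: alpha costs d, beta costs d + 1."""
    return sum(d if s == ALPHA else d + 1 for s in word)


def reduced_length(word, d):
    """Length of the code word with its last symbol discounted."""
    return length(word[:-1], d)


def reduced_order_respects_length(d, max_symbols=4):
    """Check over all words of 1..max_symbols symbols that a strictly larger
    reduced length implies a full length that is at least as large."""
    words = [
        "".join(t)
        for k in range(1, max_symbols + 1)
        for t in product(ALPHA + BETA, repeat=k)
    ]
    return all(
        length(wi, d) >= length(wj, d)
        for wi in words
        for wj in words
        if reduced_length(wi, d) > reduced_length(wj, d)
    )
```

For instance, with $d = 1$ the hypothetical word "abba" has reduced length $\alpha + 2\beta = 5$ and full length 6.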

Assume now that in the optimum prefix code there are $2m$ letters
with the longest $\bar{l}$. Let the $2m$ letters with the smallest probabilities
be $X_{2m}, \ldots, X_1$. Obviously, for an optimum prefix code, these $2m$
letters are the terminal nodes of the $m$ longest $\bar{l}$. Disregarding the rest
of the tree structure representing the optimum code for a moment, we can
symbolically represent this part of the tree as in Figure 2.


Figure  2 


Note that the arrangement in Figure 2 is not unique. Any assignment of
$X_{2m}, X_{2m-1}, \ldots, X_{m+1}$ to the $\alpha$ branches and any assignment of
$X_m, \ldots, X_1$ to the $\beta$ branches will have the same total cost. We
shall study several inequalities which permit us to simplify the
construction of an optimum prefix code. First, if

(6)  $p_{2m} \ge p_{m+1} + p_1$,

then we can rearrange Figure 2 into Figure 3 without changing the rest
of the tree.


Figure 3


This is because, in changing from Figure 2 to Figure 3, the decrease in
cost is $p_{2m} d$ and the increase in cost is $(p_{m+1} + p_1)d$. Therefore,
if (6) is true, then there exists an optimum prefix code in which the
maximum number of longest $\bar{l}$ is less than $m$.
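Written as a single expression, and using only the decrease and increase stated in the text, the net change in cost for this rearrangement is

```latex
\Delta \;=\; d\,(p_{m+1} + p_1) \;-\; d\,p_{2m} \;\le\; 0
\quad\Longleftrightarrow\quad
p_{2m} \;\ge\; p_{m+1} + p_1 ,
```

where the equivalence uses $d > 0$. Inequality (6) is therefore exactly the condition under which the rearranged tree costs no more than the original one.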

In particular, for $m = 2$, (6) becomes

(7)  $p_4 \ge p_3 + p_1$,

and there is only one $\bar{l}$ of longest length. For this optimum code,
the two terminal nodes associated with that longest $\bar{l}$ will be
$X_1$ and $X_2$. This means $X_1$ and $X_2$ will have the same code word
except for the last symbol, where $X_2$ has $\alpha$ and $X_1$ has $\beta$. In
constructing an optimum prefix code, we can treat $X_1$ and $X_2$ as one
letter with probability equal to the sum of $p_1$ and $p_2$, as done by
Huffman [1].
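The effect of treating the two letters as one can be made explicit. As a sketch, write $w$ for the common part of the two code words and $l_w$ for its length (both symbols are introduced here for illustration), so that $X_2$ has the code word $w\alpha$ and $X_1$ has $w\beta$. Then

```latex
\begin{aligned}
p_1 l_1 + p_2 l_2
  &= p_1 (l_w + d + 1) + p_2 (l_w + d) \\
  &= (p_1 + p_2)\, l_w \;+\; (d + 1)\, p_1 + d\, p_2 .
\end{aligned}
```

The first term is the contribution of a single letter of probability $p_1 + p_2$ with code word $w$, and the second term does not depend on the tree. Minimizing the cost of the reduced code with $n - 1$ letters therefore minimizes the cost of the original code among all codes in which $X_1$ and $X_2$ share the same longest $\bar{l}$.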

Secondly,  if 

(8)  d.pg^  ^  (d  +  l)p^  +  d  p^  , 


then we can change Figure 2 into Figure 4 below without increasing the
total cost.


Figure 4

This is because, in changing from Figure 2 to Figure 4, the decrease in
cost is $d\,p_{2m}$ and the increase in cost is the right-hand side of (8).

This means that if (8) is satisfied, then there exists an optimum
tree in which the maximum number of longest $\bar{l}$ is less than $m$.

In particular, for $m = 2$, (8) becomes

(9)  dpj^  ^  (d  +  l)p^  +  dp^ 

and there is only one $\bar{l}$ of longest length. Again we can associate
$X_1$ and $X_2$ with this $\bar{l}$ and hence reduce the total number of letters
by one. Note that if (7) is satisfied so will be (9), so this really
does not give us any new inequality. But if (6) is satisfied, (8)
may not be. Third, if


(10) 


>  P  ,  +  *'d-l)p.  +  dp, 
^2m  *-  m+1  2  1 


then we can change Figure 2 to Figure 5 without increasing the total
cost.


Figure 5

This is because the right-hand side of (10) represents the increase in
cost, and the left-hand side of (10) represents the decrease in cost,
in changing from Figure 2 to Figure 5. For $m = 2$, (10) will reduce
to (7). Therefore, if (7) is satisfied, we can combine $X_1$ and $X_2$
and regard them as a single letter with probability equal to the sum
of $p_2$ and $p_1$. If the newly created $n - 1$ letters, reindexed in
order of probability, again satisfy (7), we again combine the two letters
of smallest probability into one letter. This process can be continued until
(7) is not true. Note that if (7) is not satisfied, the number of
longest $\bar{l}$ may still be one.
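The repeated combination can be sketched in a few lines of Python. This is only an illustration of the loop described above: the function name `reduce_letters`, the use of integer weights in place of probabilities, and the choice to stop as soon as fewer than four letters remain (so that (7) can no longer be evaluated) are assumptions made here.

```python
def reduce_letters(weights):
    """Merge the two least probable letters for as long as (7) holds.

    With the weights sorted in increasing order, (7) reads
    p_4 >= p_3 + p_1, i.e. letters[3] >= letters[2] + letters[0].
    Returns the remaining weights and the list of merged pairs (p_1, p_2).
    """
    letters = sorted(weights)
    merges = []
    while len(letters) >= 4 and letters[3] >= letters[2] + letters[0]:
        p1, p2 = letters[0], letters[1]
        merges.append((p1, p2))
        # The merged letter takes probability p_1 + p_2; then reindex.
        letters = sorted(letters[2:] + [p1 + p2])
    return letters, merges


# Hypothetical example with unnormalized weights.
remaining, merged_pairs = reduce_letters([8, 1, 5, 1, 3, 2])
```

In this example the loop merges three times before fewer than four letters remain; a uniform source such as `[1, 1, 1, 1]` fails (7) at once and is returned unchanged.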

Setting $m = 3$ in (6), (8), and (10), we have the following inequalities:

(11)  $p_6 \ge p_4 + p_1$

(12)  dp^  ^  dp^  +  (d  +  l)p^ 

(15)  dpg  *  (d  +  l)P2  + 


If any one of (11), (12), or (13) is satisfied, then the maximum number
of longest $\bar{l}$ is at most 2. If we knew that the number of longest
$\bar{l}$ is exactly 2, then we could combine $X_1$ and $X_3$ into one letter
and $X_2$ and $X_4$ into one letter (see Figure 6). In order to be able
to combine $X_1$ and $X_3$ and also $X_2$ and $X_4$, we study in more
detail the part of the tree with terminal nodes $X_1$ and $X_2$. If
we study one more level of the part of the tree containing $X_1$ and $X_2$
and assume that there is only one longest $\bar{l}$, there are only five
possible configurations, as shown in Figures 6, 7, 8, 9, and 10.
In Figure 6, we can interchange the code words associated with
X^ and X^ without changing the total cost, i.e., we can still combine
$X_1$ with $X_3$ and $X_2$ with $X_4$, even if the number of longest $\bar{l}$
is one. In Figure 7 or Figure 8, we have written "X^ or greater, X^ or
greater". This is because we have assumed that (7) is not true. For an
optimum code, we cannot assign a letter X^ or X^ which has less probability
than $p_2$ + p^ a length shorter than the $\bar{l}$ of X^ and X^, so that
the terminal node associated with a certain branch must be X^ or
letters of greater probabilities.

In Figure 7 or Figure 8, X^ is not in the figure, but the last
symbol of the code word for X^ must be $\alpha$, as X^ is the letter with
the smallest probability not in Figure 7 or 8. The last symbol of
the code word for $X_4$ may be $\alpha$ or $\beta$; we shall assume it to be
$\alpha$ in order to be on the safe side.

Then we can transfer $X_3$ and $X_4$ into the part of the tree
containing $X_1$ and $X_2$ in Figure 7 or 8 and make it look like Figure 11.


Figure 11

The letters that originally combined with $X_3$ and $X_4$ can then
reduce their code words by one symbol, say $\alpha$, to be on the safe side.
These letters must have probabilities p^ and p^ or greater. So,
in changing from Figure 7 or Figure 8 to Figure 11, the reduction in
cost is at least (p^ + p^)d, whereas the total increase in cost is
at least (p^ + p^)d + (p^ + p^)d. Therefore, if

(14)

then we can change Figure 7 or Figure 8 into Figure 11 with no increase
in cost. Note that in Figure 11 we also combine $X_1$ with $X_3$ and $X_2$ with
$X_4$. Consider Figure 9 and Figure 10. As $X_4$ is the letter with the
smallest probability not shown in Figures 9 and 10, the last symbol
of $X_4$ must be $\beta$. The letters that combine with $X_4$ must have
a last symbol of $\alpha$ and a probability of p^ or greater. So we can
change Figure 9 and Figure 10 into Figure 12.


Figure 12
then we can change Figure 9 or Figure 10 into Figure 12, in which
$X_1$ combines with $X_3$ and $X_2$ with $X_4$.

If any one of (11), (12), or (13) is satisfied, then the maximum
number of longest $\bar{l}$ is at most two. If it is two, then we can combine
$X_1$ with $X_3$ and $X_2$ with $X_4$ and reduce the number of letters. If
there is only one longest $\bar{l}$, then there are only five configurations
possible, as shown in Figures 6, 7, 8, 9, and 10. So if (14) and (15) are
satisfied, we can still combine $X_1$ with $X_3$ and $X_2$ with $X_4$, and hence
reduce the number of letters.

Applying the inequality (7) to the example with $d = 1$ in the paper
by Karp [2] immediately reduces the number of letters.
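For very small alphabets the effect of one reduction step can be checked by exhaustive search. The sketch below is not the method of this paper, nor Karp's integer program; it is a plain dynamic program over subsets of letters, written here only as a sanity check. The function name `optimal_cost`, the bitmask representation, and the four-letter example with integer weights are assumptions made for illustration.

```python
from functools import lru_cache


def optimal_cost(weights, d):
    """Minimum of (1) over all prefix-code trees, by exhaustive search.

    A tree over a set of letters sends a non-empty subset down the alpha
    branch (cost d per unit of weight) and the rest down the beta branch
    (cost d + 1 per unit of weight). Only practical for a handful of letters.
    """
    n = len(weights)
    # total[mask] = sum of the weights of the letters selected by mask.
    total = [0] * (1 << n)
    for mask in range(1, 1 << n):
        low = mask & -mask
        total[mask] = total[mask ^ low] + weights[low.bit_length() - 1]

    @lru_cache(maxsize=None)
    def best(mask):
        if mask & (mask - 1) == 0:
            return 0  # a single letter (or none) needs no further symbols
        result = None
        sub = (mask - 1) & mask  # iterate over non-empty proper subsets
        while sub:
            other = mask ^ sub
            cost = (best(sub) + d * total[sub]
                    + best(other) + (d + 1) * total[other])
            if result is None or cost < result:
                result = cost
            sub = (sub - 1) & mask
        return result

    return best((1 << n) - 1)


# Hypothetical instance satisfying (7): p_4 = 3 >= p_3 + p_1 = 2 + 1.
d = 1
original = optimal_cost([1, 1, 2, 3], d)
# Merge X_1 and X_2 and add back the constant (d + 1) p_1 + d p_2.
reduced = optimal_cost([2, 2, 3], d) + (d + 1) * 1 + d * 1
```

On this instance the two values agree, as expected when $X_1$ and $X_2$ can be treated as a single letter.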